{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rLsMb4vqY244"
      },
      "source": [
        "Note: You can run this example right now in a Jupyter-style notebook, no setup required!  Just click \"Run in Google Colab\"\n",
        "\n",
        "<div class=\"devsite-table-wrapper\"><table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "<td><a target=\"_blank\" href=\"https://www.tensorflow.org/tfx/tutorials/model_analysis/tfma_basic\">\n",
        "<img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a></td>\n",
        "<td><a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/model_analysis/tfma_basic.ipynb\">\n",
        "<img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\">Run in Google Colab</a></td>\n",
        "<td><a target=\"_blank\" href=\"https://github.com/tensorflow/tfx/blob/master/docs/tutorials/model_analysis/tfma_basic.ipynb\">\n",
        "<img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\">View source on GitHub</a></td>\n",
        "</table></div>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YuSYVbwEYNHw"
      },
      "source": [
        "# TensorFlow Model Analysis\n",
        "***An Example of a Key Component of TensorFlow Extended (TFX)***"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mPt5BHTwy_0F"
      },
      "source": [
        "[TensorFlow Model Analysis (TFMA)](https://www.tensorflow.org/tfx/guide/tfma) is a library for performing model evaluation across different slices of data. TFMA performs its computations in a distributed manner over large amounts of data using [Apache Beam](https://beam.apache.org/documentation/programming-guide/).\n",
        "\n",
        "This example colab notebook illustrates how  TFMA can be used to investigate and visualize the performance of a model with respect to characteristics of the dataset.  We'll use a model that we trained previously, and now you get to play with the results! The model we trained was for the [Chicago Taxi Example](https://github.com/tensorflow/tfx/tree/master/tfx/examples/chicago_taxi_pipeline), which uses the [Taxi Trips dataset](https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew) released by the City of Chicago. Explore the full dataset in the [BigQuery UI](https://bigquery.cloud.google.com/dataset/bigquery-public-data:chicago_taxi_trips).\n",
        "\n",
        "As a modeler and developer, think about how this data is used and the potential benefits and harm a model's predictions can cause. A model like this could reinforce societal biases and disparities. Is a feature relevant to the problem you want to solve or will it introduce bias? For more information, read about <a target='_blank' href='https://developers.google.com/machine-learning/fairness-overview/'>ML fairness</a>.\n",
        "\n",
        "Note: In order to understand TFMA and how it works with Apache Beam, you'll need to know a little bit about Apache Beam itself.  The <a target='_blank' href='https://beam.apache.org/documentation/programming-guide/'>Beam Programming Guide</a> is a great place to start."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Fnm6Mj3vTGLm"
      },
      "source": [
        "The columns in the dataset are:\n",
        "<table>\n",
        "<tr><td>pickup_community_area</td><td>fare</td><td>trip_start_month</td></tr>\n",
        "\n",
        "<tr><td>trip_start_hour</td><td>trip_start_day</td><td>trip_start_timestamp</td></tr>\n",
        "<tr><td>pickup_latitude</td><td>pickup_longitude</td><td>dropoff_latitude</td></tr>\n",
        "<tr><td>dropoff_longitude</td><td>trip_miles</td><td>pickup_census_tract</td></tr>\n",
        "<tr><td>dropoff_census_tract</td><td>payment_type</td><td>company</td></tr>\n",
        "<tr><td>trip_seconds</td><td>dropoff_community_area</td><td>tips</td></tr>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q7-ouHFnWAsu"
      },
      "source": [
        "## Install Jupyter Extensions\n",
        "Note: If running in a local Jupyter notebook, then these Jupyter extensions must be installed in the environment before running Jupyter.\n",
        "\n",
        "```bash\n",
        "jupyter nbextension enable --py widgetsnbextension --sys-prefix \n",
        "jupyter nbextension install --py --symlink tensorflow_model_analysis --sys-prefix \n",
        "jupyter nbextension enable --py tensorflow_model_analysis --sys-prefix \n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LZj-impiAD_l"
      },
      "source": [
        "## Install TensorFlow Model Analysis (TFMA)\n",
        "\n",
        "This will pull in all the dependencies, and will take a minute.\n",
        "\n",
        "**Note to ensure all the dependencies are installed properly, you may need to re-run this install step multiple times before there are no errors.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SA2E343NAMRF"
      },
      "outputs": [],
      "source": [
        "# This setup was tested with TF 2.3 and TFMA 0.24 (using colab), but it should\n",
        "# also work with the latest release.\n",
        "import sys\n",
        "\n",
        "# Confirm that we're using Python 3\n",
        "assert sys.version_info.major==3, 'This notebook must be run using Python 3.'\n",
        "\n",
        "print('Installing TensorFlow')\n",
        "import tensorflow as tf\n",
        "print('TF version: {}'.format(tf.__version__))\n",
        "\n",
        "print('Installing Tensorflow Model Analysis and Dependencies')\n",
        "!pip install -q tensorflow_model_analysis\n",
        "import apache_beam as beam\n",
        "print('Beam version: {}'.format(beam.__version__))\n",
        "import tensorflow_model_analysis as tfma\n",
        "print('TFMA version: {}'.format(tfma.__version__))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_aD7n5eECydb"
      },
      "source": [
        "**NOTE: The output above should be clear of errors before proceeding. Re-run the install if you are still seeing errors. Also, make sure to restart the runtime/kernel before moving to the next step.**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RptgLn2RYuK3"
      },
      "source": [
        "## Load The Files\n",
        "We'll download a tar file that has everything we need.  That includes:\n",
        "\n",
        "* Training and evaluation datasets\n",
        "* Data schema\n",
        "* Training and serving saved models (keras and estimator) and eval saved models (estimator)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "K4QXVIM7iglN"
      },
      "outputs": [],
      "source": [
        "# Download the tar file from GCP and extract it\n",
        "import io, os, tempfile\n",
        "TAR_NAME = 'saved_models-2.2'\n",
        "BASE_DIR = tempfile.mkdtemp()\n",
        "DATA_DIR = os.path.join(BASE_DIR, TAR_NAME, 'data')\n",
        "MODELS_DIR = os.path.join(BASE_DIR, TAR_NAME, 'models')\n",
        "SCHEMA = os.path.join(BASE_DIR, TAR_NAME, 'schema.pbtxt')\n",
        "OUTPUT_DIR = os.path.join(BASE_DIR, 'output')\n",
        "\n",
        "!curl -O https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/{TAR_NAME}.tar\n",
        "!tar xf {TAR_NAME}.tar\n",
        "!mv {TAR_NAME} {BASE_DIR}\n",
        "!rm {TAR_NAME}.tar\n",
        "\n",
        "print(\"Here's what we downloaded:\")\n",
        "!ls -R {BASE_DIR}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_xa7ZDV1MycO"
      },
      "source": [
        "## Parse the Schema\n",
        "\n",
        "Among the things we downloaded was a schema for our data that was created by [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/).  Let's parse that now so that we can use it with TFMA."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uW5eB4TPcwFw"
      },
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "from google.protobuf import text_format\n",
        "from tensorflow.python.lib.io import file_io\n",
        "from tensorflow_metadata.proto.v0 import schema_pb2\n",
        "from tensorflow.core.example import example_pb2\n",
        "\n",
        "schema = schema_pb2.Schema()\n",
        "contents = file_io.read_file_to_string(SCHEMA)\n",
        "schema = text_format.Parse(contents, schema)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UP3yuJxfNXRL"
      },
      "source": [
        "## Use the Schema to Create TFRecords\n",
        "\n",
        "We need to give TFMA access to our dataset, so let's create a TFRecords file.  We can use our schema to create it, since it gives us the correct type for each feature."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8-wud3fPczl6"
      },
      "outputs": [],
      "source": [
        "import csv\n",
        "\n",
        "datafile = os.path.join(DATA_DIR, 'eval', 'data.csv')\n",
        "reader = csv.DictReader(open(datafile, 'r'))\n",
        "examples = []\n",
        "for line in reader:\n",
        "  example = example_pb2.Example()\n",
        "  for feature in schema.feature:\n",
        "    key = feature.name\n",
        "    if feature.type == schema_pb2.FLOAT:\n",
        "      example.features.feature[key].float_list.value[:] = (\n",
        "          [float(line[key])] if len(line[key]) > 0 else [])\n",
        "    elif feature.type == schema_pb2.INT:\n",
        "      example.features.feature[key].int64_list.value[:] = (\n",
        "          [int(line[key])] if len(line[key]) > 0 else [])\n",
        "    elif feature.type == schema_pb2.BYTES:\n",
        "      example.features.feature[key].bytes_list.value[:] = (\n",
        "          [line[key].encode('utf8')] if len(line[key]) > 0 else [])\n",
        "  # Add a new column 'big_tipper' that indicates if tips was > 20% of the fare. \n",
        "  # TODO(b/157064428): Remove after label transformation is supported for Keras.\n",
        "  big_tipper = float(line['tips']) > float(line['fare']) * 0.2\n",
        "  example.features.feature['big_tipper'].float_list.value[:] = [big_tipper]\n",
        "  examples.append(example)\n",
        "\n",
        "tfrecord_file = os.path.join(BASE_DIR, 'train_data.rio')\n",
        "with tf.io.TFRecordWriter(tfrecord_file) as writer:\n",
        "  for example in examples:\n",
        "    writer.write(example.SerializeToString())\n",
        "\n",
        "!ls {tfrecord_file}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fp8Ub7GTXH3j"
      },
      "source": [
        "## Setup and Run TFMA\n",
        "\n",
        "TFMA supports a number of different model types including TF keras models, models based on generic TF2 signature APIs, as well TF estimator based models. The [get_started](https://www.tensorflow.org/tfx/model_analysis/get_started) guide has the full list of model types supported and any restrictions. For this example we are going to show how to configure a keras based model as well as an estimator based model that was saved as an [`EvalSavedModel`](https://www.tensorflow.org/tfx/model_analysis/eval_saved_model). See the [FAQ](https://www.tensorflow.org/tfx/model_analysis/faq) for examples of other configurations.\n",
        "\n",
        "TFMA provides support for calculating metrics that were used at training time (i.e. built-in metrics) as well metrics defined after the model was saved as part of the TFMA configuration settings. For our keras [setup](https://www.tensorflow.org/tfx/model_analysis/setup) we will demonstrate adding our metrics and plots manually as part of our configuration (see the [metrics](https://www.tensorflow.org/tfx/model_analysis/metrics) guide for information on the metrics and plots that are supported). For the estimator setup we will use the built-in metrics that were saved with the model. Our setups also include a number of slicing specs which are discussed in more detail in the following sections.\n",
        "\n",
        "After creating a [`tfma.EvalConfig`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalConfig) and [`tfma.EvalSharedModel`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalSharedModel) we can then run TFMA using [`tfma.run_model_analysis`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/run_model_analysis). This will create a [`tfma.EvalResult`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/EvalResult) which we can use later for rendering our metrics and plots."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qgC7NdCatT8y"
      },
      "source": [
        "### Keras"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PLJxcjpjfwkx"
      },
      "outputs": [],
      "source": [
        "import tensorflow_model_analysis as tfma\n",
        "\n",
        "# Setup tfma.EvalConfig settings\n",
        "keras_eval_config = text_format.Parse(\"\"\"\n",
        "  ## Model information\n",
        "  model_specs {\n",
        "    # For keras (and serving models) we need to add a `label_key`.\n",
        "    label_key: \"big_tipper\"\n",
        "  }\n",
        "\n",
        "  ## Post training metric information. These will be merged with any built-in\n",
        "  ## metrics from training.\n",
        "  metrics_specs {\n",
        "    metrics { class_name: \"ExampleCount\" }\n",
        "    metrics { class_name: \"BinaryAccuracy\" }\n",
        "    metrics { class_name: \"BinaryCrossentropy\" }\n",
        "    metrics { class_name: \"AUC\" }\n",
        "    metrics { class_name: \"AUCPrecisionRecall\" }\n",
        "    metrics { class_name: \"Precision\" }\n",
        "    metrics { class_name: \"Recall\" }\n",
        "    metrics { class_name: \"MeanLabel\" }\n",
        "    metrics { class_name: \"MeanPrediction\" }\n",
        "    metrics { class_name: \"Calibration\" }\n",
        "    metrics { class_name: \"CalibrationPlot\" }\n",
        "    metrics { class_name: \"ConfusionMatrixPlot\" }\n",
        "    # ... add additional metrics and plots ...\n",
        "  }\n",
        "\n",
        "  ## Slicing information\n",
        "  slicing_specs {}  # overall slice\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_hour\"]\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_day\"]\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_values: {\n",
        "      key: \"trip_start_month\"\n",
        "      value: \"1\"\n",
        "    }\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_hour\", \"trip_start_day\"]\n",
        "  }\n",
        "\"\"\", tfma.EvalConfig())\n",
        "\n",
        "# Create a tfma.EvalSharedModel that points at our keras model.\n",
        "keras_model_path = os.path.join(MODELS_DIR, 'keras', '2')\n",
        "keras_eval_shared_model = tfma.default_eval_shared_model(\n",
        "    eval_saved_model_path=keras_model_path,\n",
        "    eval_config=keras_eval_config)\n",
        "\n",
        "keras_output_path = os.path.join(OUTPUT_DIR, 'keras')\n",
        "\n",
        "# Run TFMA\n",
        "keras_eval_result = tfma.run_model_analysis(\n",
        "    eval_shared_model=keras_eval_shared_model,\n",
        "    eval_config=keras_eval_config,\n",
        "    data_location=tfrecord_file,\n",
        "    output_path=keras_output_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hMtoi_FpthQL"
      },
      "source": [
        "### Estimator"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6MJg42JVtjjj"
      },
      "outputs": [],
      "source": [
        "import tensorflow_model_analysis as tfma\n",
        "\n",
        "# Setup tfma.EvalConfig settings\n",
        "estimator_eval_config = text_format.Parse(\"\"\"\n",
        "  ## Model information\n",
        "  model_specs {\n",
        "    # To use EvalSavedModel set `signature_name` to \"eval\".\n",
        "    signature_name: \"eval\"\n",
        "  }\n",
        "\n",
        "  ## Post training metric information. These will be merged with any built-in\n",
        "  ## metrics from training.\n",
        "  metrics_specs {\n",
        "    metrics { class_name: \"ConfusionMatrixPlot\" }\n",
        "    # ... add additional metrics and plots ...\n",
        "  }\n",
        "\n",
        "  ## Slicing information\n",
        "  slicing_specs {}  # overall slice\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_hour\"]\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_day\"]\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_values: {\n",
        "      key: \"trip_start_month\"\n",
        "      value: \"1\"\n",
        "    }\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_hour\", \"trip_start_day\"]\n",
        "  }\n",
        "\"\"\", tfma.EvalConfig())\n",
        "\n",
        "# Create a tfma.EvalSharedModel that points at our eval saved model.\n",
        "estimator_base_model_path = os.path.join(\n",
        "    MODELS_DIR, 'estimator', 'eval_model_dir')\n",
        "estimator_model_path = os.path.join(\n",
        "    estimator_base_model_path, os.listdir(estimator_base_model_path)[0])\n",
        "estimator_eval_shared_model = tfma.default_eval_shared_model(\n",
        "    eval_saved_model_path=estimator_model_path,\n",
        "    eval_config=estimator_eval_config)\n",
        "\n",
        "estimator_output_path = os.path.join(OUTPUT_DIR, 'estimator')\n",
        "\n",
        "# Run TFMA\n",
        "estimator_eval_result = tfma.run_model_analysis(\n",
        "    eval_shared_model=estimator_eval_shared_model,\n",
        "    eval_config=estimator_eval_config,\n",
        "    data_location=tfrecord_file,\n",
        "    output_path=estimator_output_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A0khNBC9FlEO"
      },
      "source": [
        "## Visualizing Metrics and Plots\n",
        "\n",
        "Now that we've run the evaluation, let's take a look at our visualizations using TFMA. For the following examples, we will visualize the results from running the evaluation on the keras model. To view the estimator based model update the `eval_result` to point at our `estimator_eval_result` variable."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XFY0BqGtGkJ0"
      },
      "outputs": [],
      "source": [
        "eval_result = keras_eval_result\n",
        "# eval_result = estimator_eval_result"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cSl9qyTCbBKR"
      },
      "source": [
        "### Rendering Metrics\n",
        "\n",
        "To view metrics you use [`tfma.view.render_slicing_metrics`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/view/render_slicing_metrics)\n",
        "\n",
        "By default the views will display the `Overall` slice. To view a particular slice you can either use the name of the column (by setting `slicing_column`) or provide a [`tfma.SlicingSpec`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/SlicingSpec).\n",
        "\n",
        "The metrics visualization supports the following interactions:\n",
        "\n",
        "* Click and drag to pan\n",
        "* Scroll to zoom\n",
        "* Right click to reset the view\n",
        "* Hover over the desired data point to see more details.\n",
        "* Select from four different types of views using the selections at the bottom.\n",
        "\n",
        "For example, we'll be setting `slicing_column` to look at the `trip_start_hour` feature from our previous `slicing_specs`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "hJ5_UMnWYmaE"
      },
      "outputs": [],
      "source": [
        "tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_hour')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LJuxvGCpn4yF"
      },
      "source": [
        "### Slices Overview\n",
        "\n",
        "The default visualization is the **Slices Overview** when the number of slices is small. It shows the values of metrics for each slice. Since we've selected `trip_start_hour` above, it's showing us metrics like accuracy and AUC for each hour, which allows us to look for issues that are specific to some hours and not others.\n",
        "\n",
        "In the visualization above:\n",
        "\n",
        "* Try sorting the feature column, which is our `trip_start_hours` feature, by clicking on the column header\n",
        "* Try sorting by precision, and **notice that the precision for some of the hours with examples is 0, which may indicate a problem**\n",
        "\n",
        "The chart also allows us to select and display different metrics in our slices.\n",
        "\n",
        "* Try selecting different metrics from the \"Show\" menu\n",
        "* Try selecting recall in the \"Show\" menu, and **notice that the recall for some of the hours with examples is 0, which may indicate a problem**\n",
        "\n",
        "It is also possible to set a threshold to filter out slices with smaller numbers of examples, or \"weights\".  You can type a minimum number of examples, or use the slider."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cQT-1Ckcnd_7"
      },
      "source": [
        "### Metrics Histogram\n",
        "\n",
        "This view also supports a **Metrics Histogram** as an alternative visualization, which is also the default view when the number of slices is large. The results will be divided into buckets and the number of slices / total weights / both can be visualized. Columns can be sorted by clicking on the column header.  Slices with small weights can be filtered out by setting the threshold. Further filtering can be applied by dragging the grey band. To reset the range, double click the band. Filtering can also be used to remove outliers in the visualization and the metrics tables. Click the gear icon to switch to a logarithmic scale instead of a linear scale.\n",
        "\n",
        "* Try selecting \"Metrics Histogram\" in the Visualization menu\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hSnqI6Esb1XM"
      },
      "source": [
        "### More Slices\n",
        "\n",
        "Our initial `tfma.EvalConfig` created a whole list of `slicing_specs`, which we can visualize by updating slice information passed to `tfma.view.render_slicing_metrics`. Here we'll select the `trip_start_day` slice (days of the week).  **Try changing the `trip_start_day` to `trip_start_month` and rendering again to examine different slices.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "355wqvY3yBod"
      },
      "outputs": [],
      "source": [
        "tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_day')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PsXM0NYGeajg"
      },
      "source": [
        "TFMA also supports creating feature crosses to analyze combinations of features. Our original settings created a cross `trip_start_hour` and `trip_start_day`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k7vbFS1Me1SH"
      },
      "outputs": [],
      "source": [
        "tfma.view.render_slicing_metrics(\n",
        "    eval_result,\n",
        "    slicing_spec=tfma.SlicingSpec(\n",
        "        feature_keys=['trip_start_hour', 'trip_start_day']))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GmeODqrUfJw2"
      },
      "source": [
        "Crossing the two columns creates a lot of combinations!  Let's narrow down our cross to only look at **trips that start at noon**.  Then let's select `binary_accuracy` from the visualization:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kdvBNfcHfRWg"
      },
      "outputs": [],
      "source": [
        "tfma.view.render_slicing_metrics(\n",
        "    eval_result,\n",
        "    slicing_spec=tfma.SlicingSpec(\n",
        "        feature_keys=['trip_start_day'], feature_values={'trip_start_hour': '12'}))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f8acksU33KMm"
      },
      "source": [
        "### Rendering Plots\n",
        "\n",
        "Any plots that were added to the `tfma.EvalConfig` as post training `metric_specs` can be displayed using [`tfma.view.render_plot`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/view/render_plot).\n",
        "\n",
        "As with metrics, plots can be viewed by slice. Unlike metrics, only plots for a particular slice value can be displayed so the `tfma.SlicingSpec` must be used and it must specify both a slice feature name and value. If no slice is provided then the plots for the `Overall` slice is used.\n",
        "\n",
        "In the example below we are displaying the `CalibrationPlot` and `ConfusionMatrixPlot` plots that were computed for the `trip_start_hour:1` slice."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "X4TCKjGw3S-a"
      },
      "outputs": [],
      "source": [
        "tfma.view.render_plot(\n",
        "    eval_result,\n",
        "    tfma.SlicingSpec(feature_values={'trip_start_hour': '1'}))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "meRvFkKcPbux"
      },
      "source": [
        "## Tracking Model Performance Over Time\n",
        "\n",
        "Your training dataset will be used for training your model, and will hopefully be representative of your test dataset and the data that will be sent to your model in production.  However, while the data in inference requests may remain the same as your training data, in many cases it will start to change enough so that the performance of your model will change.\n",
        "\n",
        "That means that you need to monitor and measure your model's performance on an ongoing basis, so that you can be aware of and react to changes.  Let's take a look at how TFMA can help.\n",
        "\n",
        "Let's load 3 different model runs and use TFMA to see how they compare using [`render_time_series`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/view/render_time_series)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zJYUOjmFfuPy"
      },
      "outputs": [],
      "source": [
        "# Note this re-uses the EvalConfig from the keras setup.\n",
        "\n",
        "# Run eval on each saved model\n",
        "output_paths = []\n",
        "for i in range(3):\n",
        "  # Create a tfma.EvalSharedModel that points at our saved model.\n",
        "  eval_shared_model = tfma.default_eval_shared_model(\n",
        "      eval_saved_model_path=os.path.join(MODELS_DIR, 'keras', str(i)),\n",
        "      eval_config=keras_eval_config)\n",
        "\n",
        "  output_path = os.path.join(OUTPUT_DIR, 'time_series', str(i))\n",
        "  output_paths.append(output_path)\n",
        "\n",
        "  # Run TFMA\n",
        "  tfma.run_model_analysis(eval_shared_model=eval_shared_model,\n",
        "                          eval_config=keras_eval_config,\n",
        "                          data_location=tfrecord_file,\n",
        "                          output_path=output_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RsO-gqCRK0ar"
      },
      "source": [
        "First, we'll imagine that we've trained and deployed our model yesterday, and now we want to see how it's doing on the new data coming in today.  The visualization will start by displaying AUC. From the UI you can:\n",
        "\n",
        "* Add other metrics using the \"Add metric series\" menu.\n",
        "* Close unwanted graphs by clicking on x\n",
        "* Hover over data points (the ends of line segments in the graph) to get more details\n",
        "\n",
        "Note: In the metric series charts the X axis is the model directory name of the model run that you're examining.  These names themselves are not meaningful."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KjEws8T0cDm9"
      },
      "outputs": [],
      "source": [
        "eval_results_from_disk = tfma.load_eval_results(output_paths[:2])\n",
        "\n",
        "tfma.view.render_time_series(eval_results_from_disk)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EQ7kZxESN9Bx"
      },
      "source": [
        "Now we'll imagine that another day has passed and we want to see how it's doing on the new data coming in today, compared to the previous two days:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VjQmlXMmLwHf"
      },
      "outputs": [],
      "source": [
        "eval_results_from_disk = tfma.load_eval_results(output_paths)\n",
        "\n",
        "tfma.view.render_time_series(eval_results_from_disk)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "N1jpShgQxlVL"
      },
      "source": [
        "## Model Validation\n",
        "\n",
        "TFMA can be configured to evaluate multiple models at the same time. Typically this is done to compare a new model against a baseline (such as the currently serving model) to determine what the performance differences in metrics (e.g. AUC, etc) are relative to the baseline. When [thresholds](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/MetricThreshold) are configured, TFMA will produce a [`tfma.ValidationResult`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/ValidationResult) record indicating whether the performance matches expecations.\n",
        "\n",
        "Let's re-configure our keras evaluation to compare two models: a candidate and a baseline. We will also validate the candidate's performance against the baseline by setting a [`tmfa.MetricThreshold`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/MetricThreshold) on the AUC metric."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kkatdR6Y1-4G"
      },
      "outputs": [],
      "source": [
        "# Setup tfma.EvalConfig setting\n",
        "eval_config_with_thresholds = text_format.Parse(\"\"\"\n",
        "  ## Model information\n",
        "  model_specs {\n",
        "    name: \"candidate\"\n",
        "    # For keras we need to add a `label_key`.\n",
        "    label_key: \"big_tipper\"\n",
        "  }\n",
        "  model_specs {\n",
        "    name: \"baseline\"\n",
        "    # For keras we need to add a `label_key`.\n",
        "    label_key: \"big_tipper\"\n",
        "    is_baseline: true\n",
        "  }\n",
        "\n",
        "  ## Post training metric information\n",
        "  metrics_specs {\n",
        "    metrics { class_name: \"ExampleCount\" }\n",
        "    metrics { class_name: \"BinaryAccuracy\" }\n",
        "    metrics { class_name: \"BinaryCrossentropy\" }\n",
        "    metrics {\n",
        "      class_name: \"AUC\"\n",
        "      threshold {\n",
        "        # Ensure that AUC is always > 0.9\n",
        "        value_threshold {\n",
        "          lower_bound { value: 0.9 }\n",
        "        }\n",
        "        # Ensure that AUC does not drop by more than a small epsilon\n",
        "        # e.g. (candidate - baseline) > -1e-10 or candidate > baseline - 1e-10\n",
        "        change_threshold {\n",
        "          direction: HIGHER_IS_BETTER\n",
        "          absolute { value: -1e-10 }\n",
        "        }\n",
        "      }\n",
        "    }\n",
        "    metrics { class_name: \"AUCPrecisionRecall\" }\n",
        "    metrics { class_name: \"Precision\" }\n",
        "    metrics { class_name: \"Recall\" }\n",
        "    metrics { class_name: \"MeanLabel\" }\n",
        "    metrics { class_name: \"MeanPrediction\" }\n",
        "    metrics { class_name: \"Calibration\" }\n",
        "    metrics { class_name: \"CalibrationPlot\" }\n",
        "    metrics { class_name: \"ConfusionMatrixPlot\" }\n",
        "    # ... add additional metrics and plots ...\n",
        "  }\n",
        "\n",
        "  ## Slicing information\n",
        "  slicing_specs {}  # overall slice\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_hour\"]\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_day\"]\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_month\"]\n",
        "  }\n",
        "  slicing_specs {\n",
        "    feature_keys: [\"trip_start_hour\", \"trip_start_day\"]\n",
        "  }\n",
        "\"\"\", tfma.EvalConfig())\n",
        "\n",
        "# Create tfma.EvalSharedModels that point at our keras models.\n",
        "candidate_model_path = os.path.join(MODELS_DIR, 'keras', '2')\n",
        "baseline_model_path = os.path.join(MODELS_DIR, 'keras', '1')\n",
        "eval_shared_models = [\n",
        "  tfma.default_eval_shared_model(\n",
        "      model_name=tfma.CANDIDATE_KEY,\n",
        "      eval_saved_model_path=candidate_model_path,\n",
        "      eval_config=eval_config_with_thresholds),\n",
        "  tfma.default_eval_shared_model(\n",
        "      model_name=tfma.BASELINE_KEY,\n",
        "      eval_saved_model_path=baseline_model_path,\n",
        "      eval_config=eval_config_with_thresholds),\n",
        "]\n",
        "\n",
        "validation_output_path = os.path.join(OUTPUT_DIR, 'validation')\n",
        "\n",
        "# Run TFMA\n",
        "eval_result_with_validation = tfma.run_model_analysis(\n",
        "    eval_shared_models,\n",
        "    eval_config=eval_config_with_thresholds,\n",
        "    data_location=tfrecord_file,\n",
        "    output_path=validation_output_path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "siF6npd3IfJq"
      },
      "source": [
        "When running evaluations with one or more models against a baseline, TFMA automatically adds diff metrics for all the metrics computed during the evaluation. These metrics are named after the corresponding metric but with `_diff` appended to the metric name.\n",
        "\n",
        "Let's take a look at the metrics produced by our run:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yGIw9TDuJ7wn"
      },
      "outputs": [],
      "source": [
        "tfma.view.render_time_series(eval_result_with_validation)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JIsehm_V4oKU"
      },
      "source": [
        "Now let's look at the output from our validation checks. To view the validation results we use [`tfma.load_validator_result`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/load_validation_result). For our example, the validation fails because AUC is below the threshold."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "48EdSTUW5eE1"
      },
      "outputs": [],
      "source": [
        "validation_result = tfma.load_validation_result(validation_output_path)\n",
        "print(validation_result.validation_ok)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tghWegsjhpkt"
      },
      "source": [
        "# Copyright &copy; 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "rSGJWC5biBiG"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tvsmelXGasty"
      },
      "source": [
        "Note: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "tfma_basic.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
